[R3]: Move to new vLLM routed experts format by S1ro1 · Pull Request #2487 · PrimeIntellect-ai/prime-rl

S1ro1 · 2026-05-13T11:59:12Z

PR is ready - routed-experts vLLM pin rebuilt from v0.21.0 + revert #42434 + PR39568

This branch moves prime-rl to the routed-experts response path used by current renderers/verifiers main, without the verifier-side renderer bypass. The x86_64 vLLM pin now points at a PrimeIntellect v0.5.0 release asset built from upstream vLLM releases/v0.21.0 plus upstream revert vllm-project/vllm#42434 and upstream PR vllm-project/vllm#39568.

The previous PR39568-only wheel still contained the PR39917 validation that rejected enable_return_routed_experts with async scheduling during inference startup. This new wheel removes that validation via revert #42434 while keeping the PR39568 routed-experts transfer path.

keep routed-experts data opaque through verifiers during token truncation; prime-rl decodes at the orchestrator boundary
preserve this branch's routed-experts source of truth: RoutedExperts(data, shape, dtype), explicit dtype maps, and _pack_routed_experts / _unpack_routed_experts for multi-turn stitching
update trainer packing/loading to slice, append, pad, and reconstruct the RoutedExperts transport struct with torch.frombuffer
pin vllm-router to 0.1.25 for the matching raw-uint8 schema and add pybase64
pin x86_64 vLLM to vllm-0.21.0+cu129.r42434.pr39568.a106aa6-cp38-abi3-manylinux_2_24_x86_64.whl
update deps/renderers to main 3ae276c (renderers-v0.1.8.dev25), including routed-experts sidecar parsing and fastokens>=0.2.0
update deps/verifiers to main 521d436c (v0.1.15.dev8-1-g521d436c), including the routed-experts response sidecar support and verifier helper cleanup
add the fastokens exclude-newer exemption and lock fastokens==0.2.0 to satisfy current renderers main
disable vLLM async scheduling only for the NIXL routed-experts capture path, where async scheduling leaves placeholder sampled-token state during capture
fix rendered multi-node orchestrator args to use the student client config keys
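
The transport struct named in the change notes above can be sketched as follows. This is a minimal illustration, not the code in this branch: the field names `data`, `shape`, and `dtype` come from the description, while `DTYPE_ITEMSIZE`, `stitch_turns`, and the assumed row-major `(num_tokens, num_layers, top_k)` layout are hypothetical stand-ins for the explicit dtype maps and `_pack_routed_experts` / `_unpack_routed_experts`.

```python
from dataclasses import dataclass
from math import prod

# Hypothetical dtype map: dtype name -> bytes per element.
DTYPE_ITEMSIZE = {"uint8": 1, "int16": 2, "int32": 4}


@dataclass(frozen=True)
class RoutedExperts:
    data: bytes             # raw row-major buffer, kept opaque until needed
    shape: tuple[int, ...]  # assumed layout: (num_tokens, num_layers, top_k)
    dtype: str              # key into DTYPE_ITEMSIZE

    def __post_init__(self) -> None:
        # Reject payloads whose byte length disagrees with shape and dtype.
        expected = prod(self.shape) * DTYPE_ITEMSIZE[self.dtype]
        if len(self.data) != expected:
            raise ValueError(f"expected {expected} bytes, got {len(self.data)}")


def stitch_turns(turns: list[RoutedExperts]) -> RoutedExperts:
    """Concatenate per-turn payloads along the token axis (axis 0)."""
    if not turns:
        raise ValueError("need at least one turn")
    first = turns[0]
    for turn in turns[1:]:
        if turn.shape[1:] != first.shape[1:] or turn.dtype != first.dtype:
            raise ValueError("turns disagree on per-token layout or dtype")
    num_tokens = sum(turn.shape[0] for turn in turns)
    # For a row-major buffer, joining along axis 0 is plain byte concatenation.
    data = b"".join(turn.data for turn in turns)
    return RoutedExperts(data, (num_tokens, *first.shape[1:]), first.dtype)
```

Because the token axis is outermost, stitching turns never has to decode the payload into Python lists.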

Related PRs

Router payload format: Feat: downstream perf improvements - bump .25 router#34
Router release trigger: release: v0.1.25 router#35
Renderers routed-experts sidecar parse: Handle routed experts as response sidecar renderers#54
Verifiers routed-experts response sidecar: Support routed experts response sidecar verifiers#1423
Verifiers cleanup against main: Clean routed experts response path verifiers#1433
vLLM routed-experts upstream: [RFC] Replace shared-memory routed experts with ModelRunnerOutput transfer and HTTP support vllm-project/vllm#39568
vLLM PR39917 revert: Revert "[Core] Replace routing replay with device cache and async D2H pipeline" (#39917) vllm-project/vllm#42434

Verification

uv sync --all-extras --locked
uv lock --locked
uv run ruff check .
uv run ruff format --check --config=pyproject.toml
PYTEST_OUTPUT_DIR=/tmp/outputs uv run pytest tests/unit -m "not gpu" - 454 passed, 65 deselected, 35 warnings
Wheel smoke import: vllm.__version__ == "0.21.0+cu129.r42434.pr39568.a106aa6", ModelRunnerOutput.routed_experts present, _validate_return_routed_experts absent, and the old async-scheduling validation string absent.
Uploaded the new r42434.pr39568.a106aa6 vLLM wheel to the prime-rl v0.5.0 release and removed the stale PR39568-only wheel asset that failed inference startup.
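
A rough sketch of the wheel smoke check described above, written so it runs without vLLM installed. The function name `smoke_check` and its three parameters are hypothetical; in a real check the arguments would presumably be the imported `vllm` module, its `ModelRunnerOutput` class, and whichever class used to own `_validate_return_routed_experts` (the description does not say which). The search for the old async-scheduling validation string is not covered here.

```python
from types import SimpleNamespace

# Version string quoted in the verification notes.
EXPECTED_VERSION = "0.21.0+cu129.r42434.pr39568.a106aa6"


def smoke_check(vllm_module, runner_output_cls, validator_owner) -> list[str]:
    """Return a list of problems; an empty list means the wheel looks right."""
    problems = []
    version = getattr(vllm_module, "__version__", None)
    if version != EXPECTED_VERSION:
        problems.append(f"unexpected version: {version!r}")
    # Accept either a dataclass field or a plain attribute.
    fields = getattr(runner_output_cls, "__dataclass_fields__", {})
    if "routed_experts" not in fields and not hasattr(runner_output_cls, "routed_experts"):
        problems.append("ModelRunnerOutput.routed_experts is missing")
    # The revert is expected to have removed this validator.
    if hasattr(validator_owner, "_validate_return_routed_experts"):
        problems.append("_validate_return_routed_experts is still present")
    return problems


# Stand-in objects so the sketch is runnable anywhere.
fake_vllm = SimpleNamespace(__version__=EXPECTED_VERSION)
fake_output_cls = SimpleNamespace(routed_experts=None)
fake_owner = SimpleNamespace()
print(smoke_check(fake_vllm, fake_output_cls, fake_owner))  # prints []
```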

Note

Medium Risk
Medium risk because this changes the routed_experts wire/transport format end-to-end (inference response → orchestrator → trainer) and adds a vLLM monkey-patch for NIXL disaggregated inference, which could break router-replay or inference compatibility if assumptions drift.

Overview
Updates prime-rl to the new vLLM routed-experts schema by exporting routing decisions as a compact base64-encoded uint8 byte payload (plus shape) from /inference/v1/generate, and decoding/packing it at the orchestrator boundary instead of propagating nested Python lists.
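
To make the payload format concrete, here is a small sketch of encoding and decoding a base64 uint8 payload with its shape. The dictionary keys `data` and `shape` and both function names are assumptions for illustration, not the actual response schema. The standard-library `base64` module is used so the example has no dependencies; `pybase64` offers a compatible `b64decode`.

```python
import base64
from math import prod


def encode_routed_experts(raw: bytes, shape: tuple[int, ...]) -> dict:
    """Build a response-style sidecar: base64 text plus the tensor shape."""
    if len(raw) != prod(shape):
        raise ValueError("uint8 payload length must equal the product of shape")
    return {"data": base64.b64encode(raw).decode("ascii"), "shape": list(shape)}


def decode_routed_experts(sidecar: dict) -> tuple[bytes, tuple[int, ...]]:
    """Decode once at the boundary; callers keep the result as raw bytes."""
    shape = tuple(sidecar["shape"])
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(sidecar["data"], validate=True)
    if len(raw) != prod(shape):
        raise ValueError(
            f"payload has {len(raw)} bytes but shape {shape} needs {prod(shape)}"
        )
    return raw, shape
```

Checking the byte length against the shape at decode time surfaces a truncated or mismatched payload before it reaches trainer packing.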

Introduces a RoutedExperts msgpack transport struct and refactors trainer batching/packing to slice/append/pad routed-experts using raw bytes, reconstructing tensors via torch.frombuffer. Adds config validation to reject router replay when inference.kv_cache_offload is enabled, and includes a vLLM __post_init__ monkey-patch to allow routed-experts capture with the NixlConnector in P/D disaggregated inference.
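
Slicing and padding on raw bytes can be sketched as below, assuming a row-major buffer whose first axis is the token axis and zero bytes as the pad value. The helper names are hypothetical and the trainer code in this PR may differ.

```python
from math import prod


def token_stride(shape: tuple[int, ...], itemsize: int = 1) -> int:
    """Bytes occupied by one token, i.e. everything after axis 0."""
    return prod(shape[1:]) * itemsize


def slice_tokens(data: bytes, shape: tuple[int, ...], start: int, stop: int,
                 itemsize: int = 1) -> tuple[bytes, tuple[int, ...]]:
    """Keep tokens [start, stop) without decoding the buffer."""
    start, stop, _ = slice(start, stop).indices(shape[0])
    stop = max(stop, start)
    stride = token_stride(shape, itemsize)
    return data[start * stride:stop * stride], (stop - start, *shape[1:])


def pad_tokens(data: bytes, shape: tuple[int, ...], target_tokens: int,
               itemsize: int = 1) -> tuple[bytes, tuple[int, ...]]:
    """Append zero-filled tokens until the buffer holds target_tokens."""
    missing = target_tokens - shape[0]
    if missing < 0:
        raise ValueError("target_tokens is smaller than the current length")
    stride = token_stride(shape, itemsize)
    return data + bytes(missing * stride), (target_tokens, *shape[1:])
```

With PyTorch installed, a call along the lines of `torch.frombuffer(bytearray(data), dtype=torch.uint8).reshape(shape)` should then rebuild the tensor; the `bytearray` copy is there because `torch.frombuffer` warns when given a read-only buffer.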

Bumps dependency pins to match the new protocol (vllm-router 0.1.25, x86_64 vLLM wheel), adds pybase64, and updates/extends unit tests to cover the new serialization and transport behavior.

Reviewed by Cursor Bugbot for commit c3ffa15. Bugbot is set up for automated code reviews on this repo.

* Guard checkpoint disk metrics mkdir
* Remove test_trainer_utils.py per review feedback
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Simplify ckpt disk metrics guard
  Drop the rank-0 gate and the disk_usage path fallback per review feedback. Catching FileExistsError on mkdir is sufficient: every rank that races on mkdir either wins or harmlessly catches the BeeGFS race, and shutil.disk_usage can then operate on the now-existing ckpt_dir.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
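
The guard described in this commit message might look roughly like the sketch below. The function name `ckpt_disk_usage` is hypothetical; only the pattern (catch `FileExistsError` on `mkdir`, then call `shutil.disk_usage`) comes from the message.

```python
import shutil
from pathlib import Path


def ckpt_disk_usage(ckpt_dir: Path):
    """Create ckpt_dir if needed, tolerating a creation race, then report usage."""
    try:
        # exist_ok=True usually suffices, but on some networked filesystems a
        # concurrent mkdir from another rank can still surface FileExistsError.
        ckpt_dir.mkdir(parents=True, exist_ok=True)
    except FileExistsError:
        # Another rank won the race; the directory exists, which is all we need.
        pass
    return shutil.disk_usage(ckpt_dir)
```

Every rank can call this safely, which is why no rank-0 gate is needed.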

…erts
# Conflicts:
# pyproject.toml
# src/prime_rl/inference/patches.py
# src/prime_rl/inference/vllm/serving_chat_with_tokens.py
# src/prime_rl/inference/vllm/serving_tokens.py
# src/prime_rl/orchestrator/trajectories.py
# src/prime_rl/trainer/batch.py
# src/prime_rl/trainer/rl/data.py
# tests/unit/orchestrator/test_batch.py
# uv.lock

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit fbf16de.

S1ro1 added 2 commits May 13, 2026 17:41

feat: wire r3 v3 routed experts stack

4895316

feat: reset routing caches on policy update

721a874

S1ro1 force-pushed the feat/r3-v3-routed-experts branch from bf79561 to 721a874 Compare May 13, 2026 12:13

S1ro1 added 3 commits May 13, 2026 18:02

fix: rely on native vllm routed experts

baa6935

fix: pin routed experts dependencies for ci

18e9a7a

fix: update scheduler tests for prefix reset

90d2e3a

S1ro1 force-pushed the feat/r3-v3-routed-experts branch from bc91c30 to e55328f Compare May 14, 2026 14:09

fix: clean routed experts replay integration

1fea38e

S1ro1 force-pushed the feat/r3-v3-routed-experts branch from e55328f to 1fea38e Compare May 14, 2026 14:13

S1ro1 added 3 commits May 14, 2026 20:39

fix: keep routed experts transport first class

2c019e1

fix: keep routed experts on samples

803b4ae

fix: use upstream vllm nightly wheel

9092eca

S1ro1 marked this pull request as ready for review May 14, 2026 15:52

S1ro1 and others added 2 commits May 14, 2026 21:23

fix: pin latest routed experts verifiers

f49caec

Merge branch 'main' into feat/r3-v3-routed-experts

9438623

cursor Bot reviewed May 14, 2026

Comment thread src/prime_rl/orchestrator/trajectories.py

S1ro1 added 3 commits May 15, 2026 00:28

fix: pin routed experts dependencies

61a0388

fix: allow routed experts with nixl

094d233

style: format nixl patch

d6d06b4

cursor Bot reviewed May 14, 2026

Comment thread src/prime_rl/trainer/rl/data.py

samsja previously approved these changes May 15, 2026

Use raw uint8 routed experts payloads

9317cef

S1ro1 dismissed samsja’s stale review via 9317cef May 15, 2026 21:20

cursor Bot reviewed May 15, 2026

Comment thread pyproject.toml Outdated

S1ro1 and others added 6 commits May 16, 2026 03:08

Remove unrelated rlm-swe dependency

3cb8345

Pin vllm-router 0.1.25 wheel

66a2984

Keep verifiers routed experts opaque

777aae7

Forward renderer thinking preservation config

a74e7f5

Avoid duplicate routed experts in token responses

a723ac0

S1ro1 added 2 commits May 19, 2026 19:46

Pin verifiers routed experts sidecar

e2cffa1

Pin cleaned verifiers routed experts handling

0edc0c5

S1ro1 force-pushed the feat/r3-v3-routed-experts branch 2 times, most recently from 64d3f2c to cb3c559 Compare May 19, 2026 15:46

Pin rebased verifiers routed experts handling

4402d7e

S1ro1 force-pushed the feat/r3-v3-routed-experts branch from cb3c559 to 4402d7e Compare May 19, 2026 15:53

S1ro1 added 2 commits May 21, 2026 20:45

fix: remove unrelated prime-rl changes

7076bb1

cursor Bot reviewed May 21, 2026

Comment thread src/prime_rl/trainer/batch.py

S1ro1 force-pushed the feat/r3-v3-routed-experts branch from 11fe0ad to de71036 Compare May 21, 2026 21:31

Merge branch 'main' into feat/r3-v3-routed-experts

aa1fc36

cursor Bot reviewed May 21, 2026

Comment thread src/prime_rl/orchestrator/trajectories.py Outdated

S1ro1 and others added 5 commits May 22, 2026 05:07

fix: pack routed experts as typed payloads

62cc96b

refactor: inline routed experts trajectory packing

ae6b8b3

fix: restore trajectory tokenization helpers

6de7fd1

refactor: simplify routed experts packing

f50ad90

Merge branch 'main' into feat/r3-v3-routed-experts

48fc98c

S1ro1 changed the title ~~feat: wire r3 v3 routed experts replay~~ [R3]: Move to new vLLM routed experts format May 22, 2026

S1ro1 and others added 3 commits May 21, 2026 19:09

Merge branch 'main' into feat/r3-v3-routed-experts

c13b0b3

chore: pin vllm router wheel

97c65a2

Merge branch 'main' into feat/r3-v3-routed-experts

17c1508

cursor Bot reviewed May 22, 2026

Comment thread src/prime_rl/trainer/batch.py

chore: update renderers and verifiers

fbf16de

cursor Bot reviewed May 22, 2026

Comment thread src/prime_rl/orchestrator/trajectories.py

Pin vLLM PR39568 backport wheel

2630ce9

mikasenghaas previously approved these changes May 22, 2026

Pin vLLM revert42434 PR39568 wheel

c3ffa15

S1ro1 dismissed mikasenghaas’s stale review via c3ffa15 May 22, 2026 23:18

samsja approved these changes May 22, 2026

S1ro1 merged commit 1e0fe96 into main May 22, 2026
18 checks passed

S1ro1 merged 38 commits into main from feat/r3-v3-routed-experts

3 participants
